Environment execution



Adaptive Policy Synchronization for Scalable Reinforcement Learning

Lafuente-Mercado, Rodney

arXiv.org Artificial Intelligence

Scaling reinforcement learning (RL) often requires running environments across many machines, but most frameworks tie simulation, training, and infrastructure into rigid systems. We introduce ClusterEnv, a lightweight interface for distributed environment execution that preserves the familiar Gymnasium API. ClusterEnv uses the DETACH pattern, which moves environment reset() and step() operations to remote workers while keeping learning centralized. To reduce policy staleness without heavy communication, we propose Adaptive Policy Synchronization (APS), where workers request updates only when divergence from the central learner grows too large. ClusterEnv supports both on- and off-policy methods, integrates into existing training code with minimal changes, and runs efficiently on clusters. Experiments on discrete control tasks show that APS maintains performance while cutting synchronization overhead. Source code is available at https://github.com/rodlaf/ClusterEnv.
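The APS trigger described above, where a worker pulls fresh parameters only when it has drifted too far from the central learner, can be illustrated with a small sketch. The abstract does not say how divergence is measured or how the threshold is chosen. Purely for illustration, this sketch assumes the mean KL divergence between the learner's and the worker's action distributions over a small probe set of observations, compared against a fixed threshold. `Learner`, `Worker`, `maybe_sync`, the tabular toy policy, and the threshold value are all hypothetical names and choices, not ClusterEnv's API. For simplicity, the worker reads the learner's current logits directly to compute the divergence. A real distributed system would need a cheaper signal so that the check itself does not cost the communication APS is meant to save.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


class Learner:
    """Hypothetical central learner: a tabular policy mapping observation -> logits."""

    def __init__(self, table):
        self.table = {obs: list(l) for obs, l in table.items()}
        self.version = 0

    def train_step(self, obs, delta):
        # Toy "gradient step": shift the logits for a single observation.
        self.table[obs] = [x + d for x, d in zip(self.table[obs], delta)]
        self.version += 1

    def snapshot(self):
        # Copy so later learner updates do not leak into a worker's stale copy.
        return self.version, {obs: list(l) for obs, l in self.table.items()}


class Worker:
    """Hypothetical worker holding a possibly stale copy of the learner's policy."""

    def __init__(self, learner, threshold):
        self.learner = learner
        self.threshold = threshold  # assumed fixed divergence budget
        self.syncs = 0
        self.pull()

    def pull(self):
        # The expensive operation APS tries to avoid: copying full parameters.
        self.version, self.table = self.learner.snapshot()
        self.syncs += 1

    def divergence(self, probe_obs):
        # Assumed metric: mean KL(learner || worker) over a probe set.
        total = 0.0
        for obs in probe_obs:
            total += kl(softmax(self.learner.table[obs]), softmax(self.table[obs]))
        return total / len(probe_obs)

    def maybe_sync(self, probe_obs):
        """Request fresh parameters only if divergence exceeds the threshold."""
        if self.divergence(probe_obs) > self.threshold:
            self.pull()
            return True
        return False


if __name__ == "__main__":
    learner = Learner({0: [0.0, 0.0], 1: [0.0, 0.0]})
    worker = Worker(learner, threshold=0.05)
    for _ in range(20):
        learner.train_step(0, [0.1, -0.1])  # learner keeps changing obs 0
        worker.maybe_sync(probe_obs=[0, 1])
    print("learner updates:", learner.version, "worker syncs:", worker.syncs)
```

In this toy run, the worker synchronizes far less often than once per learner update, yet its divergence never stays above the threshold after a check. Lowering `threshold` trades more synchronization traffic for less policy staleness.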



EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Weng, Jiayi, Lin, Min, Huang, Shengyi, Liu, Bo, Makoviichuk, Denys, Makoviychuk, Viktor, Liu, Zichen, Song, Yufan, Luo, Ting, Jiang, Yukun, Xu, Zhongwen, Yan, Shuicheng

arXiv.org Artificial Intelligence

There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, and Sample Factory aim to improve the system's overall throughput. In this paper, we address a common bottleneck in RL training systems, namely parallel environment execution, which is often the slowest part of the whole system yet receives little attention. With a carefully designed approach to parallelizing RL environments, we improve environment simulation speed across a range of hardware setups, from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves one million frames per second of environment execution on Atari environments and three million frames per second on MuJoCo environments. On a laptop, EnvPool runs 2.8x as fast as Python subprocess-based parallelization. Moreover, the open-source community has demonstrated EnvPool's compatibility with existing RL training libraries such as CleanRL, rl_games, and DeepMind Acme. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that training agents to play Atari Pong and MuJoCo Ant on a laptop takes only five minutes. EnvPool is open-sourced at https://github.com/sail-sg/envpool.
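One idea behind fast parallel environment execution is that a learner does not have to wait for the slowest environment in a batch. EnvPool exposes this through an asynchronous mode: when `batch_size` is smaller than `num_envs`, `recv()` returns only the environments that finish stepping first. The following is a rough, hypothetical pure-Python sketch of that batching logic using a thread pool and queues. `ToyEnv`, `AsyncPool`, and their methods are made-up names for this illustration, not EnvPool's API. EnvPool's real engine is written in C++, and this sketch is not meant to reproduce its throughput.

```python
import queue
import random
import threading
import time


class ToyEnv:
    """Hypothetical environment whose step() takes a random amount of time."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        time.sleep(self.rng.uniform(0.0, 0.002))  # simulate uneven step cost
        self.t += 1
        done = self.t >= 5
        obs = self.reset() if done else self.t  # auto-reset finished episodes
        return obs, float(action), done


class AsyncPool:
    """Hypothetical pool: num_envs environments stepped by a fixed set of threads.

    recv() returns the first batch_size environments to finish, so one slow
    environment never stalls the whole batch.
    """

    def __init__(self, num_envs, batch_size, num_threads=4):
        assert batch_size <= num_envs
        self.envs = [ToyEnv(seed=i) for i in range(num_envs)]
        self.batch_size = batch_size
        self.actions = queue.Queue()  # pending (env_id, action) requests
        self.results = queue.Queue()  # finished steps, in completion order
        self.threads = [
            threading.Thread(target=self._loop, daemon=True)
            for _ in range(num_threads)
        ]
        for th in self.threads:
            th.start()

    def _loop(self):
        while True:
            env_id, action = self.actions.get()
            if env_id is None:  # shutdown sentinel
                return
            obs, reward, done = self.envs[env_id].step(action)
            self.results.put((env_id, obs, reward, done))

    def reset(self):
        """Synchronously reset every environment; returns one observation per env."""
        return [env.reset() for env in self.envs]

    def send(self, env_ids, actions):
        """Queue one action per env; each env should have at most one step in flight."""
        for env_id, action in zip(env_ids, actions):
            self.actions.put((env_id, action))

    def recv(self):
        """Block until batch_size steps have finished and return them."""
        return [self.results.get() for _ in range(self.batch_size)]

    def close(self):
        for _ in self.threads:
            self.actions.put((None, None))
        for th in self.threads:
            th.join()
```

With `num_envs=8` and `batch_size=4`, the caller sends actions for all eight environments once, then repeatedly calls `recv()` and sends new actions only for the environments it just received. About half of the pool keeps stepping while the learner processes the other half.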